NSF PAR Search | NSF Public Access Repository

FlexpushdownDB: rethinking computation pushdown for cloud OLAP DBMSs

https://doi.org/10.1007/s00778-024-00867-8

Yang, Yifei; Yu, Xiangyao; Serafini, Marco; Aboulnaga, Ashraf; Stonebraker, Michael (September 2024, The VLDB Journal)

Modern cloud-native OLAP databases adopt a storage-disaggregation architecture that separates the management of computation and storage. A major bottleneck in such an architecture is the network connecting the computation and storage layers. Computation pushdown is a promising solution to tackle this issue, which offloads some computation tasks to the storage layer to reduce network traffic. This paper presents FlexPushdownDB (FPDB), where we revisit the design of computation pushdown in a storage-disaggregation architecture, and then introduce several optimizations to further accelerate query processing. First, FPDB supports hybrid query execution, which combines local computation on cached data and computation pushdown to cloud storage at a fine granularity. Within the cache, FPDB uses a novel Weighted-LFU cache replacement policy that takes into account the cost of pushdown computation. Second, we design adaptive pushdown as a new mechanism to avoid throttling the storage-layer computation during pushdown, which pushes the request back to the computation layer at runtime if the storage-layer computational resource is insufficient. Finally, we derive a general principle to identify pushdown-amenable computational tasks, by summarizing common patterns of pushdown capabilities in existing systems, and further propose two new pushdown operators, namely, selection bitmap and distributed data shuffle. Evaluation on SSB and TPC-H shows each optimization can improve the performance by 2.2×, 1.9×, and 3× respectively.
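The adaptive pushdown idea in this abstract can be illustrated with a small sketch. It is not FPDB's implementation: the `StorageNode` class, its slot counter, and the `scan` helper are hypothetical names, and the sketch assumes that "insufficient resource" simply means no free compute slot on the storage node. When a slot is free the filter runs at storage; otherwise the raw rows are returned and the compute layer applies the filter itself.

```python
def filter_rows(rows, predicate):
    """Apply a selection predicate to a list of rows."""
    return [r for r in rows if predicate(r)]


class StorageNode:
    """Hypothetical storage node with a bounded number of compute slots."""

    def __init__(self, rows, compute_slots):
        self.rows = rows
        self.free_slots = compute_slots

    def request(self, predicate):
        """Return (rows, pushed_down).

        If a compute slot is free, the filter runs here (pushdown).
        Otherwise the request is pushed back: unfiltered rows are returned.
        """
        if self.free_slots > 0:
            self.free_slots -= 1
            try:
                return filter_rows(self.rows, predicate), True
            finally:
                self.free_slots += 1
        return list(self.rows), False


def scan(node, predicate):
    """Compute-layer scan that finishes the filter if storage declined it."""
    rows, pushed_down = node.request(predicate)
    return rows if pushed_down else filter_rows(rows, predicate)


rows = [1, 5, 8, 12]
idle = StorageNode(rows, compute_slots=1)
busy = StorageNode(rows, compute_slots=0)
print(scan(idle, lambda x: x > 4), scan(busy, lambda x: x > 4))
```

Both paths return the same rows; only the place where the filter executes, and therefore the amount of data sent over the network, differs.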
Full Text Available
Take Out the TraChe: Maximizing (Tra)nsactional Ca(che) Hit Rate

Cheng, Audrey; Chu, David; Li, Terrance; Chan, Jason; Crooks, Natacha; Hellerstein, Joseph M; Stoica, Ion; Yu, Xiangyao (July 2023, USENIX Association)

Most caching policies focus on increasing object hit rate to improve overall system performance. However, these algorithms are insufficient for transactions. In this work, we define a new metric, transactional hit rate, to capture when caching reduces latency for transactions. We present DeToX, a caching system that leverages transactional dependencies to make eviction and prefetching decisions. DeToX is able to significantly outperform single-object alternatives on real-world workloads and popular OLTP benchmarks, providing up to a 130% increase in transaction hit rate and 3.4x improvement in cache efficiency.
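To make the difference between the two metrics concrete, here is a minimal sketch. It assumes, purely for illustration, that a transaction counts as a hit only when every key it reads is already cached; the abstract does not give the paper's exact definition, so the rule and the function names below are assumptions.

```python
def object_hit_rate(transactions, cache):
    """Fraction of individual key reads served from the cache."""
    reads = [key for txn in transactions for key in txn]
    return sum(key in cache for key in reads) / len(reads)


def transactional_hit_rate(transactions, cache):
    """Fraction of transactions whose reads are ALL served from the cache.

    Assumed rule: a single miss forces a trip to storage, so the
    transaction's latency is not reduced.
    """
    hits = sum(all(key in cache for key in txn) for txn in transactions)
    return hits / len(transactions)


transactions = [["a", "b"], ["a", "c"], ["d"]]
cache = {"a", "d"}
print(object_hit_rate(transactions, cache))
print(transactional_hit_rate(transactions, cache))
```

In this example three of the five reads hit the cache, yet only one of the three transactions is served entirely from it, which is the gap the abstract points at.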
Full Text Available
Towards Accelerating Data Intensive Application's Shuffle Process Using SmartNICs

https://doi.org/10.1145/3589980

Lin, Jiaxin; Ji, Tao; Hao, Xiangpeng; Cha, Hokeun; Le, Yanfang; Yu, Xiangyao; Akella, Aditya (May 2023, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

The wide adoption of the emerging SmartNIC technology creates new opportunities to offload application-level computation into the networking layer, which frees the burden of host CPUs, leading to performance improvement. Shuffle, the all-to-all data exchange process, is a critical building block for network communication in distributed data-intensive applications and can potentially benefit from SmartNICs. In this paper, we develop SmartShuffle, which accelerates the data-intensive application's shuffle process by offloading various computation tasks into the SmartNIC devices. SmartShuffle supports offloading both low-level network functions, including data partitioning and network transport, and high-level computation tasks, including filtering, aggregation, and sorting. SmartShuffle adopts a coordinated offload architecture to make sender-side and receiver-side SmartNICs jointly contribute to the benefits of shuffle computation offload. SmartShuffle carefully manages the tight and time-varying computation and memory constraints on the device. We propose a liquid offloading approach, which dynamically migrates operators between the host CPU and the SmartNIC at runtime such that resources in both devices are fully utilized. We prototype SmartShuffle on the Stingray SoC SmartNICs and plug it into Spark. Our evaluation shows that SmartShuffle improves host CPU efficiency and I/O efficiency with lower job completion time. SmartShuffle outperforms Spark, and Spark RDMA by up to 40% on TPC-H.
Full Text Available
Orchestrating data placement and query execution in heterogeneous CPU-GPU DBMS

https://doi.org/10.14778/3551793.3551809

Yogatama, Bobbi W.; Gong, Weiwei; Yu, Xiangyao (July 2022, Proceedings of the VLDB Endowment)

There has been a growing interest in using GPU to accelerate data analytics due to its massive parallelism and high memory bandwidth. The main constraint of using GPU for data analytics is the limited capacity of GPU memory. Heterogeneous CPU-GPU query execution is a compelling approach to mitigate the limited GPU memory capacity and PCIe bandwidth. However, the design space of heterogeneous CPU-GPU query execution has not been fully explored. We aim to improve state-of-the-art CPU-GPU data analytics engine by optimizing data placement and heterogeneous query execution. First, we introduce a semantic-aware fine-grained caching policy which takes into account various aspects of the workload such as query semantics, data correlation, and query frequency when determining data placement between CPU and GPU. Second, we introduce a heterogeneous query executor which can fully exploit data in both CPU and GPU and coordinate query execution at a fine granularity. We integrate both solutions in Mordred, our novel hybrid CPU-GPU data analytics engine. Evaluation on the Star Schema Benchmark shows that the semantic-aware caching policy can outperform the best traditional caching policy by up to 3x. Compared to existing GPU DBMSs, Mordred can outperform by an order of magnitude.
Full Text Available
Tile-based Lightweight Integer Compression in GPU

https://doi.org/10.1145/3514221.3526132

Shanbhag, Anil; Yogatama, Bobbi; Yu, Xiangyao; Madden, Samuel (June 2022, Proceedings of the 2022 International Conference on Management of Data (SIGMOD ’22))

Full Text Available
Litmus: Towards a Practical Database Management System with Verifiable ACID Properties and Transaction Correctness

https://doi.org/10.1145/3514221.3517851

Xia, Yu; Yu, Xiangyao; Butrovich, Matthew; Pavlo, Andrew; Devadas, Srinivas (June 2022, Proceedings of the 2022 International Conference on Management of Data)

Full Text Available
ASAP: A Speculative Approach to Persistence

https://doi.org/10.1109/HPCA53966.2022.00070

Yadalam, Sujay; Shah, Nisarg; Yu, Xiangyao; Swift, Michael (April 2022, 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA))

Persistent memory enables a new class of applications that have persistent in-memory data structures. Recoverability of these applications imposes constraints on the ordering of writes to persistent memory. But the cache hierarchy and memory controllers in modern systems may reorder writes to persistent memory. Therefore, programmers have to use expensive flush and fence instructions that stall the processor to enforce such ordering. While prior efforts circumvent stalling on long latency flush instructions, these designs under-perform in large-scale systems with many cores and multiple memory controllers. We propose ASAP, an architectural model in which the hardware takes an optimistic approach by persisting data eagerly, thereby avoiding any ordering stalls and utilizing the total system bandwidth efficiently. ASAP avoids stalling by allowing writes to be persisted out-of-order, speculating that all writes will eventually be persisted. For correctness, ASAP saves recovery information in the memory controllers which is used to undo the effects of speculative writes to memory in the event of a crash. Over a large number of representative workloads, ASAP improves performance over current Intel systems by 2.3× on average and performs within 3.9% of an ideal system.
Full Text Available
How Good is My HTAP System?

https://doi.org/10.1145/3514221.3526148

Milkai, Elena; Chronis, Yannis; Gaffney, Kevin P.; Guo, Zhihan; Patel, Jignesh M.; Yu, Xiangyao (June 2022, SIGMOD)

Full Text Available
FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS

https://doi.org/10.14778/3476249.3476265

Yang, Yifei; Youill, Matt; Woicik, Matthew; Liu, Yizhou; Yu, Xiangyao; Serafini, Marco; Aboulnaga, Ashraf; Stonebraker, Michael (July 2021, Proceedings of the VLDB Endowment)
Modern cloud databases adopt a storage-disaggregation architecture that separates the management of computation and storage. A major bottleneck in such an architecture is the network connecting the computation and storage layers. Two solutions have been explored to mitigate the bottleneck: caching and computation pushdown. While both techniques can significantly reduce network traffic, existing DBMSs consider them as orthogonal techniques and support only one or the other, leaving potential performance benefits unexploited. In this paper we present FlexPushdownDB (FPDB), an OLAP cloud DBMS prototype that supports fine-grained hybrid query execution to combine the benefits of caching and computation pushdown in a storage-disaggregation architecture. We build a hybrid query executor based on a new concept called separable operators to combine the data from the cache and results from the pushdown processing. We also propose a novel Weighted-LFU cache replacement policy that takes into account the cost of pushdown computation. Our experimental evaluation on the Star Schema Benchmark shows that the hybrid execution outperforms both the conventional caching-only architecture and pushdown-only architecture by 2.2×. In the hybrid architecture, our experiments show that Weighted-LFU can outperform the baseline LFU by 37%.
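One way to picture a cost-aware LFU policy is the sketch below. It is an illustration under stated assumptions, not FPDB's actual policy: each access adds an estimated pushdown cost to the key's score instead of adding 1 as plain LFU would, and a new key replaces the lowest-scoring cached key only if its own score is higher. The class name, the `pushdown_cost` parameter, and the admission rule are all hypothetical.

```python
class WeightedLFUCache:
    """Sketch of an LFU variant whose counters are weighted by miss cost."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.score = {}  # key -> accumulated weighted frequency
        self.data = {}   # key -> cached value

    def access(self, key, value, pushdown_cost):
        """Record one access and return True on a cache hit.

        pushdown_cost is an assumed estimate of how expensive it would be
        to serve this key by pushdown when it is not cached.
        """
        self.score[key] = self.score.get(key, 0.0) + pushdown_cost
        if key in self.data:
            return True
        if len(self.data) >= self.capacity:
            victim = min(self.data, key=lambda k: self.score[k])
            if self.score[victim] >= self.score[key]:
                return False  # newcomer is less valuable; keep the cache as is
            del self.data[victim]
        self.data[key] = value
        return False


cache = WeightedLFUCache(capacity=1)
cache.access("cheap", 1, pushdown_cost=1.0)
cache.access("costly", 2, pushdown_cost=5.0)
cache.access("cheap", 1, pushdown_cost=1.0)
print(sorted(cache.data))
```

Here the key that is expensive to push down stays cached even though the cheap key is accessed more often, which is the behavior a plain frequency count would not give.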
Full Text Available
Taurus: lightweight parallel logging for in-memory database management systems

https://doi.org/10.14778/3425879.3425889

Xia, Yu; Yu, Xiangyao; Pavlo, Andrew; Devadas, Srinivas (October 2020, Proceedings of the VLDB Endowment)
Existing single-stream logging schemes are unsuitable for in-memory database management systems (DBMSs) as the single log is often a performance bottleneck. To overcome this problem, we present Taurus, an efficient parallel logging scheme that uses multiple log streams, and is compatible with both data and command logging. Taurus tracks and encodes transaction dependencies using a vector of log sequence numbers (LSNs). These vectors ensure that the dependencies are fully captured in logging and correctly enforced in recovery. Our experimental evaluation with an in-memory DBMS shows that Taurus's parallel logging achieves up to 9.9X and 2.9X speedups over single-streamed data logging and command logging, respectively. It also enables the DBMS to recover up to 22.9X and 75.6X faster than these baselines for data and command logging, respectively. We also compare Taurus with two state-of-the-art parallel logging schemes and show that the DBMS achieves up to 2.8X better performance on NVMe drives and 9.2X on HDDs.
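The role of an LSN vector can be sketched in a few lines. This is a simplified illustration, not Taurus's actual algorithm: it assumes one vector entry per log stream, that a transaction inherits dependencies by taking the element-wise maximum with the vectors of the data it touches, and that a log record may be replayed once every stream has been recovered at least up to the record's vector. The function names are hypothetical.

```python
def merge(txn_vector, tuple_vector):
    """Element-wise max: the transaction now depends on everything the tuple did."""
    return [max(a, b) for a, b in zip(txn_vector, tuple_vector)]


def can_replay(record_vector, recovered_lsns):
    """A record is replayable once each stream has recovered past its dependency."""
    return all(done >= need for need, done in zip(record_vector, recovered_lsns))


# Two log streams. The transaction starts with a dependency on LSN 3 of
# stream 0, then reads a tuple last written at LSN 7 of stream 1.
vector = merge([3, 0], [1, 7])
print(vector)
print(can_replay(vector, recovered_lsns=[5, 6]))
print(can_replay(vector, recovered_lsns=[5, 7]))
```

Because each stream only needs to compare LSNs, recovery threads for different streams can proceed in parallel and wait only where a vector entry says a dependency has not been replayed yet.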
Full Text Available
